Auditory model based speech recognition in noisy environment
Authors
Abstract
The main purpose of this paper is to show how to improve speech recognition performance in noisy environments. The most widely used speech feature in speech recognition to date is probably the MFCC. The recognition rate of systems using MFCC and CDHMM is known to be very high in clean speech, but it deteriorates greatly in noisy environments, especially in white noise. In this paper, we propose a new speech feature, the auditory spectrum based feature (ASBF), which is based on a mathematical model of the inner ear of the human auditory system. This new feature is extracted using both the inner-ear model and a primary auditory nerve processing model of the human auditory system, and it can track the speech formants effectively. In the experiments, the performance of MFCC and ASBF is compared in both clean and noisy environments using a left-to-right CDHMM with 6 states and 5 Gaussian mixtures. The experimental results show that ASBF is much more robust to noise than MFCC: with only a 5-dimensional ASBF vector, the recognition rate is approximately 38.6% higher than that of the traditional 39-dimensional MFCC at S/N = 10 dB with white noise.
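Since the abstract does not spell out the ASBF extraction steps, the following Python sketch is only an illustrative contrast between a standard 39-dimensional MFCC front end and a crude low-dimensional feature that picks spectral peaks (formant-like cues) from an auditory-motivated mel spectrum; the function names, the mel filterbank, the peak-picking rule, and the synthetic test signal are assumptions, not the authors' algorithm.

# Illustrative sketch only (not the authors' ASBF algorithm): contrast a
# standard 39-dimensional MFCC front end with a crude low-dimensional
# feature that picks spectral peaks (formant-like cues) from an
# auditory-motivated mel spectrum. Requires numpy, scipy and librosa.
import numpy as np
import librosa
from scipy.signal import find_peaks

def mfcc_39(y, sr, n_mfcc=13):
    # 13 static MFCCs plus delta and delta-delta -> 39 dimensions per frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.vstack([mfcc,
                      librosa.feature.delta(mfcc),
                      librosa.feature.delta(mfcc, order=2)])

def peak_features(y, sr, n_peaks=5, n_mels=64):
    # Frequencies of the n_peaks strongest peaks in each frame of a mel
    # spectrum, sorted low to high: a rough stand-in for formant tracking.
    S_db = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    # Approximate center frequencies of the mel bands.
    mel_freqs = librosa.mel_frequencies(n_mels=n_mels + 2, fmax=sr / 2)[1:-1]
    feats = np.zeros((n_peaks, S_db.shape[1]))
    for t in range(S_db.shape[1]):
        peaks, props = find_peaks(S_db[:, t], height=S_db[:, t].min())
        strongest = np.argsort(props["peak_heights"])[::-1][:n_peaks]
        chosen = np.sort(mel_freqs[peaks[strongest]])
        feats[:len(chosen), t] = chosen
    return feats

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # Synthetic vowel-like test signal: two strong spectral peaks plus noise.
    y = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
         + 0.05 * np.random.randn(sr))
    print(mfcc_39(y, sr).shape)        # (39, n_frames)
    print(peak_features(y, sr).shape)  # (5, n_frames)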
Similar Articles
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
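The snippet above only names PNCC. As a rough orientation, the sketch below shows the main way PNCC-style cepstra differ from MFCC: a power-law nonlinearity (exponent around 1/15) replaces log compression. The full PNCC pipeline of Kim and Stern also uses a gammatone filterbank and medium-time power bias subtraction, which are omitted here; the function name and parameter choices are assumptions.

# Simplified PNCC-style cepstra: same structure as MFCC, but a power-law
# nonlinearity (exponent ~1/15) replaces log compression. The full PNCC
# algorithm also uses a gammatone filterbank and medium-time power bias
# subtraction, which this sketch omits.
import numpy as np
import librosa
from scipy.fft import dct

def powerlaw_cepstra(y, sr, n_ceps=13, exponent=1.0 / 15.0, n_mels=40):
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)  # band powers
    compressed = np.power(S + 1e-10, exponent)    # power-law instead of np.log
    return dct(compressed, type=2, axis=0, norm="ortho")[:n_ceps]  # (13, n_frames)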
Auditory spectrum based features (ASBF) for robust speech recognition
MFCCs are features commonly used in speech recognition systems today. The recognition accuracy of systems using MFCC is known to be high in clean speech, but it drops greatly in noisy environments. In this paper, we propose new features called the auditory spectrum based features (ASBF) that are based on the cochlear model of the human auditory system. These new features can track the...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
The quality of speech signals degrades significantly in the presence of environmental noise, which impairs the performance of hearing aids, automatic speech recognition systems, and mobile phones. In this paper, single-channel enhancement of speech corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
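The incoherent dictionary-learning method itself is not described in this truncated snippet. Purely to illustrate processing speech in the wavelet transform domain, the sketch below applies generic soft-threshold denoising with PyWavelets; this is a stand-in technique, not the proposed algorithm, and the wavelet choice, decomposition level, and threshold rule are assumptions.

# Generic wavelet-domain denoising (soft thresholding), shown only to
# illustrate processing speech in the wavelet transform domain; this is
# NOT the incoherent dictionary-learning method proposed in the paper.
import numpy as np
import pywt

def wavelet_denoise(noisy, wavelet="db8", level=4):
    coeffs = pywt.wavedec(noisy, wavelet, level=level)
    # Noise level estimated from the finest detail coefficients (robust MAD).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(noisy)))       # universal threshold
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                              for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[:len(noisy)]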
SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment
In this paper, we propose a computational auditory scene analysis (CASA)-based front-end for two-microphone speech recognition in a car environment. One of the important issues in CASA is the accurate estimation of mask information for separating the target speech from multi-microphone noisy speech. For this task, the time-frequency mask information is compensated through the si...
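For context on the role of a time-frequency mask in such a CASA front end (the paper's SNR-based compensation is not detailed in this snippet), the sketch below simply applies a given ratio mask to STFT bins and resynthesizes the target; the mask itself is assumed to be estimated elsewhere, e.g. from two-microphone cues.

# Minimal illustration of time-frequency masking as used in CASA front
# ends: given a mask in [0, 1] per STFT bin (estimated elsewhere, e.g.
# from two-microphone cues), attenuate non-target bins and resynthesize.
# The mask-estimation and SNR-based compensation steps of the paper are
# not shown here.
import numpy as np
from scipy.signal import stft, istft

def apply_tf_mask(noisy, mask, sr=16000, nperseg=512):
    f, t, X = stft(noisy, fs=sr, nperseg=nperseg)
    assert mask.shape == X.shape          # one weight per time-frequency bin
    _, separated = istft(X * mask, fs=sr, nperseg=nperseg)
    return separated[:len(noisy)]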